ABSTRACT



System and method for examining a file (1) associated with 
a digital computer (2) to determine whether a computer virus 
is present within the file (1). The file (1) contains at least one 
numbered sector. When the file (1) is examined for an initial 
time, the file (1) is scanned by an antivirus module (3, 5). At 
that time, the numbers of the sectors being scanned and a 
hash value for each scanned sector are stored into a critical 
sector file (4). The hash values can be calculated by an 
antivirus accelerator module (5). When the file (1) is exam- 
ined a subsequent time, all of the file (1) sectors that were 
scanned the initial time are examined by the antivirus 
accelerator module (5). Each of these sectors again has its 
hash value calculated and compared with the hash value of 
the corresponding sector as stored within the critical sector 
file (4). When any calculated hash value fails to match a 
corresponding stored hash value for any sector, the antivirus 
scan module (3) is commanded to rescan the entire file (1). 

11 Claims, 3 Drawing Sheets 
ANTIVIRUS ACCELERATOR

TECHNICAL FIELD

This invention pertains to the field of detecting viruses in
computer software.

BACKGROUND ART

There are several techniques of the prior art that have been
used to increase the speed of scanning computer files by
antivirus software.

For example, the software product known as Norton
Antivirus (NAV) manufactured by Symantec Corporation
runs continuously in the background of a processor. If a file
is modified, it is automatically rescanned by NAV. The NAV
server-based antivirus software keeps a cache of files that
have been scanned and certified clean (virus-free) since the
last reboot of the server. If such a file is later accessed by the
user, NAV does not rescan the file, since NAV knows that the
file is already clean. Such a technique works well for servers,
because servers are rarely rebooted and the same files are
used over and over again. However, on desktop (client)
computers that are reset frequently, such a cache cannot be
maintained for long periods, because desktop computers are
rebooted frequently. Furthermore, desktop computers
typically contain a relatively low amount of memory.

In a second technique of the prior art, desktop-based
antivirus programs, such as IBM's Antivirus, store hash data
for each program on the hard drive to speed up scanning
operations. Once a file is scanned, a hash value (or simply
"hash") of the contents of the file is stored in a database. The
hash value is a contraction of the file contents created by a
hash function, which may or may not be specifically tailored
to the type of the file. Hash functions are described in
Schneier, Bruce, Applied Cryptography 2d ed. (John Wiley
& Sons, Inc.), Chapter 18, pp. 429-460, U.S.A.

A hash function is a many-to-one function, i.e., more than
one file configuration can have the same hash value,
although this is highly unlikely. In this prior art technique,
during subsequent scans of the file, the hash of the file is first
computed by the antivirus software, and if the computed
hash matches the hash stored in the database, the file is
certified clean by the antivirus software without the
necessity for a rescan. This is possible because a match shows,
with a high degree of certainty, that the file has not been
modified. This technique eliminates the need for costly
CPU-intensive rescans of the file.

Currently, the prior art techniques either take a hash of the
entire file or specifically tailor their hash to critical areas of
the file based upon the internal file format. If these critical
areas change, there is a possibility of viral infection. If the
areas do not change, the likelihood of viral infection is
reduced and the file is not rescanned.

A second company that has a technology for hashing files
on a desktop computer and rescanning them only if the hash
values have changed is Sophos Ltd, of the United Kingdom.

DISCLOSURE OF INVENTION

The present invention is a computer-based apparatus and
method for examining files (1) associated with a digital
computer (2) to determine whether a computer virus is
present within said file (1). The file (1) contains at least one
numbered sector. An initial time that the file (1) is examined,
the file (1) is scanned by an antivirus module (3, 5). The
number of each file sector that is scanned and a hash value
of each sector that is scanned are stored into a first storage
area (4). When the file (1) is examined a subsequent time, all
of the file (1) sectors that were scanned the initial time the
file (1) was examined are examined by the antivirus
accelerator module (5). A hash value for each file (1) sector so
examined is computed and compared with the hash value for
the corresponding sector that is stored within said first
storage area (4). When any computed hash value fails to
match a corresponding stored hash value for any sector, the
entire file (1) is rescanned by the antivirus scan module (3).

BRIEF DESCRIPTION OF THE DRAWINGS

These and other more detailed and specific objects and
features of the present invention are more fully disclosed in
the following specification, reference being had to the
accompanying drawings, in which:

FIG. 1 is a system block diagram illustrating a preferred
embodiment of the present invention.

FIG. 2 is a simplified flow diagram illustrating the present
invention.

FIG. 3 is a detailed flow diagram illustrating a preferred
embodiment of the present invention.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENTS

There is a trend for antivirus scanning to become more
CPU-bound and less IO-bound. This is because of the
popularity of CPU-intensive antivirus techniques such as
emulation. Because of this trend, it is advantageous to scan
files once and to store relevant information about the files,
including a hash value of the file, in a database. The next
time the file is scanned, its hash value is looked up in the
database and matched against the current hash value for that
file. If the hash values match, the file need not be rescanned.
This is an effective way to eliminate redundant scanning for
at least some machines, including servers. However, this
technique has two major flaws:

1. Computing a hash value for the entire file may take
longer than an actual antivirus scan for that file, particularly
with larger files (such as documents and spreadsheets) that
may harbor viruses.

2. If one wishes to compute a hash value for just part of
the file in order to speed performance, one has to specifically
design parsing and hashing code for each of the major file
formats being scanned. For example, NAV currently
contains hashing code for .com and .exe files. For DOS .exe
files, NAV computes a hash value from the entry point and
header, since this is the most likely location of a viral
infection. However, Word for Windows document files (in
the OLE and .doc formats) do not have an entry point per se.
An antivirus engineer would have to build another parser
and hasher for OLE and .doc file formats to properly hash
relevant sections of the file to check for viruses. To hash for
Excel viruses, one would have to build yet another parser
and hasher. A parser is first needed, because the parser can
distinguish between critical portions of a file, e.g.,
distinguish between executable code and data. After the parser has
determined what are the critical portions of the file for
purposes of antivirus protection, a hasher can be built to
create the hash value based upon the critical portions of the
file.

The present invention overcomes the disadvantages of the
prior art, by offering the following:

1. A technique that yields the security of a full file hash
while requiring a hash to be taken on only a minimal set
of sectors from the file in question.
2. A technique which does not require additional
programming of a parser and hasher every time a new
virus-hosting file format (such as .com, .exe, .doc, .xls,
PowerPoint, etc.) is released. 

The operation of the present invention will now be 
described in conjunction with the Figures. A file 1 is to be 
examined to determine whether or not it contains a virus. 
File 1 is associated with a digital computer 2. FIG. 1 
illustrates file 1 as being within computer 2, e.g., file 1 
resides within RAM (random access memory) 10 within 
computer 2. File 1 could originally have been on a hard disk, 
floppy disk, or any other computer readable medium, and 
could be (partially or totally) brought into RAM 10 before 
it is acted upon by the antivirus modules 3, 5. 

Antivirus scan module 3 can be a conventional antivirus 
product such as Norton Antivirus (NAV). FIG. 1 illustrates 
a separate antivirus accelerator module 5 as performing most 
of the tasks of the present invention. Alternatively, modules 
3 and 5 could be combined into one module and just as 
readily perform the tasks of the present invention; or many 
or most of these features could be grafted onto module 3. 

Modules 3 and 5 are typically embodied as computer 
programs, executable by a processor 9 within computer 2. 
Alternatively, modules 3 and 5 could be firmware and/or 
hardware modules or any combination of software, 
firmware, and hardware. 

File 1 is divided into sectors. There could be just one 
sector. FIG. 1 illustrates file 1 as having an integral number 
J of sectors. 

The apparatus of the present invention operates differently 
depending upon whether file 1 is being examined an initial 
time or a subsequent time. 

As used in the present specification and claims, "exam- 
ined an initial time" means: 

1. File 1 is being examined for the very first time ever; 

2. File 1 is being re-examined after it has been determined 
that the contents of file 1 have changed; 

3. A virus definition within antivirus scan module 3 has 
changed; or 

4. An antivirus scanning engine within antivirus scan 
module 3 has changed. 

As used in the present specification and claims, "exam- 
ined a subsequent time" means that file 1 is being examined 
other than for an initial time, as "initial time" has been 
defined above. 
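
The four-part definition of "examined an initial time" can be pictured as a single predicate. The sketch below is illustrative only and is not taken from the specification: the `ExaminationState` class, its field names, and the integer version numbers are hypothetical stand-ins for register means 6 and for whatever module 3 uses to identify its virus definitions and scanning engines.

```python
from dataclasses import dataclass


@dataclass
class ExaminationState:
    """Hypothetical stand-in for register means 6 plus scanner version data."""
    examined_before: bool = False   # flag kept in register means 6
    definitions_version: int = 0    # virus definitions in use at the last scan
    engine_version: int = 0         # scanning engine in use at the last scan


def is_initial_examination(state: ExaminationState,
                           contents_changed: bool,
                           definitions_version: int,
                           engine_version: int) -> bool:
    """Return True when the examination counts as an 'initial time'."""
    if not state.examined_before:                          # case 1: first time ever
        return True
    if contents_changed:                                   # case 2: file contents changed
        return True
    if definitions_version != state.definitions_version:   # case 3: new definitions
        return True
    if engine_version != state.engine_version:             # case 4: new engine
        return True
    return False                                           # otherwise: a subsequent time
```

Any examination for which this predicate is false is, by the definition above, a "subsequent time".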

The information as to whether file 1 is being scanned an 
initial time can be conveyed to antivirus accelerator module 
5 by means of a flag set within initial examination register 
means 6 stored within computer 2 (step 21). As used in this 
specification and claims, a "register means" is any device, 
module, or technique used to store information that changes 
over time. Thus, "register means" includes hardware, 
software, and/or firmware registers, stacks, flags, automata, 
indication bits, etc. 

The following steps are performed when register means 6 
informs module 5 that file 1 is being scanned for an initial 
time: 

1. The contents of critical sector file 4 are set to zero (step 
30). File 4 is in any storage area separate from file 1, 
and is typically located in RAM 10 to maximize speed. 

2. Antivirus scan module 3 is invoked to scan file 1 in the 
normal manner (step 22). Depending upon the scanning 
engines within module 3, less than all of the sectors of 
file 1 may be scanned, or all the sectors may be 
scanned. 

3. During the scanning of file 1, module 5 places into 
critical sector file 4 the number of each of the sectors 
that is scanned (step 23). Alternative to module 5 
performing this task as illustrated in FIG. 1, this can be 
done automatically every time a sector is read from file 
1 via hooks attached to read and seek functions of the 
engines within antivirus scan module 3. As each sector 
is operated upon by module 3, module 5 calculates the 
hash value for that sector, and inserts the hash value 
into file 4 (also step 23). FIG. 1 illustrates the special 
case where four sectors are scanned, namely sectors 1, 
2, 3, and J. 

4. Module 5 determines the size of file 1 and places this 
value into file 4 (step 31). 

5. If a virus is detected by module 3 (step 32), module 3 
typically informs the user, by sending a message to the 
user via user interface 7, e.g., a monitor (step 33). If, on 
the other hand, module 3 does not detect a virus in file 
1 (step 32), module 5 causes file 4 to be moved to a 
relatively more permanent location, such as hard disk 
11, where file 4 becomes known as remote critical 
sector file 8 (step 34). 

6. Module 5 updates the contents of register means 6 to 
indicate that file 1 has been examined (step 35). 
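
The initial-examination steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the 512-byte sector size, the use of SHA-256 as the hash function, the `RecordingReader` class, the dictionary layout of the record, and the `toy_scan` function are all hypothetical. The specification names neither a sector size nor a particular hash function.

```python
import hashlib

SECTOR_SIZE = 512  # assumed sector size; the text does not fix one


class RecordingReader:
    """Serves sectors to the scanner and records each one read (the 'hook')."""

    def __init__(self, data: bytes):
        self.data = data
        self.sectors_read = {}  # sector number -> hash; starts empty (step 30)

    def read_sector(self, number: int) -> bytes:
        start = (number - 1) * SECTOR_SIZE  # sectors numbered from 1, as in FIG. 1
        sector = self.data[start:start + SECTOR_SIZE]
        # Record the sector number and its hash as the scanner reads it (step 23).
        self.sectors_read[number] = hashlib.sha256(sector).hexdigest()
        return sector


def initial_examination(data: bytes, scan):
    """Run the scan once; return a critical-sector record, or None if infected."""
    reader = RecordingReader(data)
    virus_found = scan(reader)              # step 22: scanner decides what to read
    if virus_found:
        return None                         # step 33: report, keep no record
    # Steps 31 and 34: store the file size with the per-sector hashes.
    return {"size": len(data), "sectors": dict(reader.sectors_read)}


def toy_scan(reader: RecordingReader) -> bool:
    """Toy scanner: inspects sectors 1 and 3 for a marker byte string."""
    return any(b"VIRUS" in reader.read_sector(n) for n in (1, 3))
```

For a clean four-sector file, `initial_examination(bytes(4 * SECTOR_SIZE), toy_scan)` yields a record covering only sectors 1 and 3, because those are the only sectors the toy scanner read.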

A subsequent time that file 1 is examined, module 5 
checks the contents within register means 6 (step 21). This 
time, module 5 is informed that file 1 has already been 
examined an initial time. Therefore, the following set of 
steps are performed: 

1. Module 5 causes remote critical sector file 8 to be 
moved from disk 11 to RAM 10, where file 8 is again 
known as critical sector file 4 (step 36). 

2. Module 5 determines the size of file 1, and compares 
this determined size versus the size of file 1 that has 
been previously stored in file 4 (step 37). If these two 
numbers are different, then module 5 concludes that the 
contents of file 1 have changed in some way, and 
commands module 3 to rescan the entire file 1 for 
viruses (step 38), commencing with step 30, as 
described above. 

3. If the determined size of file 1 equals the prestored 
value of the size of file 1 as stored within file 4, then 
module 5 determines from file 4 what file 1 sectors have 
previously been scanned. Module 5 then computes the 
hash values for each of those prescanned sectors (step 
25), and compares the computed hash values against 
the prestored (in file 4) hash values, respectively (step 
26). 

4. If all of the recently computed hash values are respec- 
tively identical to all of the prestored hash values, then 
module 5 makes the determination that file 1 is 
"unchanged in a way that could allow for a viral 
infection". This determination can be sent to the user 
via user interface 7 (step 39). 

5. If any computed hash value fails to match its corre- 
sponding prestored hash value for that sector, then 
module 5 commands module 3 to rescan the entire file 
1 for viruses (step 27). If a virus is detected (step 40), 
the user can be informed as before (step 41). If module 
3 fails to detect the presence of a virus within file 1 
(step 40), the most recently computed hash values for 
the scanned sectors are inserted into file 4 (step 42). 
Then file 4 is moved to disk 11 where it becomes file 
8 (step 43). 
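
The subsequent-examination check can be sketched in the same spirit. Again this is an assumption-laden illustration: the record layout (a dictionary with `size` and `sectors` keys), the 512-byte sector size, and SHA-256 are hypothetical choices, and `needs_rescan` is not a name from the specification.

```python
import hashlib

SECTOR_SIZE = 512  # assumed sector size; the text does not fix one


def sector_hash(data: bytes, number: int) -> str:
    """Hash one sector of the file; sectors are numbered from 1."""
    start = (number - 1) * SECTOR_SIZE
    return hashlib.sha256(data[start:start + SECTOR_SIZE]).hexdigest()


def needs_rescan(data: bytes, record: dict) -> bool:
    """Return True when module 3 must rescan the entire file."""
    if len(data) != record["size"]:          # step 37: size comparison
        return True                          # step 38: full rescan
    for number, stored in record["sectors"].items():
        if sector_hash(data, number) != stored:   # steps 25-26
            return True                      # step 27: full rescan
    return False                             # step 39: unchanged where it matters


# Usage: a 10-sector file whose recorded critical sectors are 1, 3, and 10.
clean = bytes(10 * SECTOR_SIZE)
record = {"size": len(clean),
          "sectors": {n: sector_hash(clean, n) for n in (1, 3, 10)}}
```

With this record, a change confined to sector 5 leaves `needs_rescan` false, while a change to sector 3 or to the file length makes it true. Only the recorded sectors are hashed, which is what keeps the check cheaper than hashing the whole file.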

Any time any change is made to antivirus scan module 3, 
such as putting in new virus definitions or changing the 
scanning engines, file 1 must be rescanned for viruses. In 
such a case, the contents of file 8 are deemed to be invalid, 
and register means 6 is updated to indicate that file 1 has not
been initially examined. Thus, step 30 will be entered, as
described above.

The present invention overcomes the two flaws of the
prior art, for the following reasons:

1. With respect to scanning and hashing a minimal set of
sectors in file 1, the present invention calculates hash
values for only those sectors actually retrieved by
module 5. Module 3 is deterministic, i.e., it always acts
in the same way with the same file 1. Therefore, module
3 always scans the same set of sectors, unless file 1
changes in length or the contents of those sectors have
changed in some way. If a sector which is not in the set
of sectors retrieved from file 8 has changed, module 3
is oblivious to that fact. But that is of no import to the
present invention, because module 3 never scanned that
sector to begin with. Module 3 will always detect all of
the viruses that it currently knows how to detect, by
looking only at the critical fixed set of sectors that has
been stored in files 4 and 8.

For example, let us assume that the scanning engines
within module 3 virus-scan sectors 1, 3, and 10 from a file
1 of size 10K. If a change were made to either sectors 1, 3,
or 10, module 3 would notice the change, since it reads and
scans these three sectors. Thus, file 1 would definitely need
to be rescanned. However, if a change were made to another
sector, say sector 5, and the size of file 1 did not change,
none of the scanning engines would have detected nor cared
about this change. This would be outside the set of sectors
that must be examined to detect a virus according to the
current scanning engines with their current set of data. A
new version of module 3 might check for sectors 1, 3, 5, and
10. At that time, file 1 would be scanned anew, and a virus
in sector 5 would be detected.

2. With respect to the prior art flaw of requiring additional
programming of parsers and hashers to support new file
formats, the antivirus accelerator module 5 of the
present invention automatically hashes all sectors
scanned by module 3 in the same way, regardless of the
contents of the sectors. No new parser or hasher coding
needs to be performed and incorporated into module 5
to support new file formats. Once a new scanning
engine is incorporated into module 3, file 1 is scanned
anew, as discussed above. From this point on, the old
scanning engines scan the original set of sectors, for
example 1, 3, and 10, and the new scanning engine
scans new sectors, say 5 and 6. Critical sector file 4 then
contains information for sectors 1, 3, 5, 6, and 10, and
the invention works as before.

Using prior art techniques, the antivirus developer would
have to actually build a parser module to specifically
traverse the file having the new format, then hash the
information in a way that is specifically attuned to that
particular file format, an expensive and time consuming
process. With the present invention, once the developer has
built a new scan module 3, the hashing of the relevant
sectors is done automatically whenever the relevant sectors
are reloaded into file 4.

The above description is included to illustrate the
operation of the preferred embodiments, and is not meant to limit
the scope of the invention. The scope of the invention is to
be limited only by the following claims. From the above
discussion, many variations will be apparent to one skilled
in the art that would yet be encompassed by the spirit and
scope of the present invention.

What is claimed is:

1. A computer-based method for examining a file
associated with a digital computer, said file containing a plurality
of file sectors, to determine whether a computer virus is
present within said file, the method comprising the steps of:

when the file is being examined an initial time:

scanning selected file sectors of the file by an antivirus
module, the selected file sectors being fewer than all
of the file sectors and defining a critical fixed set of
sectors; and

storing into a first storage area the number of each file
sector that is scanned and a hash value of each sector
that is scanned; and

when the file is being examined a subsequent time:

computing a hash value only for each file sector in the
critical fixed set of sectors;

comparing each computed hash value with the hash
value stored within said first storage area for the
corresponding sector; and

rescanning the file by the antivirus module when any
computed hash value fails to match a corresponding
stored hash value for any sector in the critical fixed
set of sectors.

2. The method of claim 1 comprising the additional step
of setting the entire contents of the first storage area to zero
before the file is examined for the initial time.

3. The method of claim 1 wherein, during the initial
examination of the file, the sector numbers are automatically
read into the first storage area by means of hooks associated
with engines of the antivirus module.

4. The method of claim 1 wherein, during the initial
examination of the file, the antivirus module determines the
size of the file and stores said size within the first storage
area.

5. The method of claim 4 wherein, during subsequent
examinations of the file, the size of the file is computed, and
when the computed file size differs from the file size stored
in the first storage area, the entire file is scanned for viruses
by the antivirus module.

6. The method of claim [illegible], wherein the steps associated
with a subsequent examination of the file are performed
whenever the size of the file has not changed since the initial
examination of the file.

7. The method of claim [illegible], wherein, when the computed
hash values respectively match all of the corresponding
stored hash values, the antivirus module declares that the file
is unchanged in a way that could allow for a viral infection.

8. The method of claim [illegible], wherein, when the antivirus
module fails to detect a virus in the file during the initial
examination of the file, the contents of the first storage area
are moved to a second storage more permanent than the
first storage area.

9. The method of claim 8, wherein, during a subsequent
examination of the file, hash values that are computed by the
antivirus module are compared with contents of said second
storage area.

10. The method of claim 8, wherein, when the antivirus
module is changed, the contents of the second storage area
are deemed to be invalid and the file is reexamined for
viruses.

11. Apparatus for speeding detection of computer viruses,
the apparatus comprising:

a first file associated with a digital computer and
containing a plurality of numbered sectors;

coupled to the first file, an antivirus scan module adapted
to detect the presence of computer viruses only within
selected file sectors of the first file, the selected file
sectors being fewer than all of the file sectors and
defining a critical fixed set of sectors;

coupled to the antivirus scan module, an antivirus
accelerator module;
coupled to the antivirus accelerator module, a register 
means indicating whether the first file has already been 
scanned by the antivirus scan module; and 

a critical sectors file coupled to a module from the group 
of modules consisting of the antivirus accelerator
module and the antivirus scan module, said critical sectors 
file containing the size of the first file, the number of the 
sectors from the first file scanned by the antivirus scan 
module, and a hash value only for each sector in the 
critical fixed set of sectors. 

***** 